Building on Efficient Foundations: Effective Training of LLMs with Structured Feedforward Layers
State-of-the-art results in large language models (LLMs) often rely on scale, which becomes computationally expensive. This has sparked a research agenda to reduce these models' parameter counts and computational costs without significantly impacting their performance. Our study focuses on transformer-based LLMs, specifically targeting the computationally intensive feedforward networks (FFNs), which are less studied than attention blocks. We consider three structured linear parameterizations of the FFN using efficient low-rank and block-diagonal matrices. In contrast to many previous works that examined these approximations, our study i) explores these structures from a training-from-scratch perspective, ii) scales up to 1.3B parameters, and iii) is conducted within recent Transformer-based LLMs rather than convolutional architectures. We demonstrate that these structures can lead to actual computational gains in various scenarios, including online decoding when using a pre-merge technique.
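As a rough illustration of the parameter savings such structured parameterizations offer, here is a minimal NumPy sketch of a low-rank and a block-diagonal FFN projection. The layer sizes, rank, block count, and ReLU nonlinearity are illustrative assumptions for this sketch, not the paper's actual configuration:

```python
import numpy as np

d_model, d_ff = 512, 2048          # illustrative sizes, not the paper's
rank, n_blocks = 64, 4             # structure hyperparameters (assumed)

# Dense FFN up-projection: a full d_model x d_ff weight matrix.
dense_params = d_model * d_ff

# Low-rank parameterization: W ~= U @ V with U (d_model x rank),
# V (rank x d_ff); parameters scale with rank instead of d_ff.
U = np.random.randn(d_model, rank) / np.sqrt(d_model)
V = np.random.randn(rank, d_ff) / np.sqrt(rank)
lowrank_params = U.size + V.size

# Block-diagonal parameterization: n_blocks independent dense blocks,
# each mapping d_model/n_blocks -> d_ff/n_blocks; cuts parameters by 1/n_blocks.
blocks = [np.random.randn(d_model // n_blocks, d_ff // n_blocks)
          for _ in range(n_blocks)]
blockdiag_params = sum(b.size for b in blocks)

# Forward pass through the low-rank projection with a ReLU nonlinearity.
x = np.random.randn(4, d_model)          # a small batch of token vectors
h = np.maximum(x @ U @ V, 0.0)           # shape (4, d_ff)
```

With these illustrative sizes, the dense matrix has 1,048,576 parameters, the low-rank factorization 163,840, and the block-diagonal variant 262,144, which is the kind of reduction that translates into the training and decoding gains the abstract refers to.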